feat: batch video generation endpoint #399

Draft

ryanontheinside wants to merge 16 commits into main from ryanontheinside/feat/generate-endpoint

@ryanontheinside ryanontheinside commented Feb 4, 2026

Summary

Adds /api/v1/batch — an HTTP batch video generation API with SSE progress streaming. Processes video chunk-by-chunk with file-based binary transfer for inputs and outputs.

Primary consumer is the ComfyUI custom nodes (comfyui-scope), but the endpoint is client-agnostic.

Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/v1/batch` | POST | Generate video (SSE stream) |
| `/api/v1/batch/cancel` | POST | Cancel after current chunk |
| `/api/v1/batch/upload` | POST | Upload input video for v2v |
| `/api/v1/batch/upload-data` | POST | Upload binary data blob (VACE frames/masks, per-chunk video) |
| `/api/v1/batch/download` | GET | Download output video |

Key design decisions

Why not use the existing WebRTC streaming path?

Maximum quality and accuracy are the objective. WebRTC compression, frame dropping, and streaming-specific infrastructure are not only irrelevant but contrary to that goal. Batch generation needs lossless frame transfer and deterministic chunk processing.

Why a single binary blob with chunk spec offsets?

The ChunkSpec + blob approach allows a single generation request to dynamically mix t2v, v2v, VACE conditioning, and VACE masking on a per-chunk basis. A client packs all binary data (VACE frames/masks, per-chunk input video) into one blob upload, then references regions by byte offset in each ChunkSpec. This avoids multiple upload round-trips and keeps the JSON request clean.
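As a rough sketch of what a client does on its side of this protocol (the `pack_blob` helper is hypothetical, not part of this PR), binary arrays are concatenated into one contiguous blob and each region is addressed by byte offset plus shape, matching the `vace_frames_offset` / `vace_frames_shape` fields shown in the ChunkSpec example below:

```python
import numpy as np

def pack_blob(arrays):
    """Pack named float32 arrays into one contiguous blob.

    Returns the blob bytes plus a {name: (offset, shape)} map whose
    entries a client would copy into each ChunkSpec (e.g. as
    vace_frames_offset / vace_frames_shape).
    """
    parts, offsets, cursor = [], {}, 0
    for name, arr in arrays.items():
        raw = np.ascontiguousarray(arr, dtype=np.float32).tobytes()
        offsets[name] = (cursor, list(arr.shape))
        parts.append(raw)
        cursor += len(raw)
    return b"".join(parts), offsets

# Example: VACE conditioning frames for two chunks packed into one blob.
frames_c5 = np.zeros((1, 3, 12, 320, 576), dtype=np.float32)
frames_c7 = np.ones((1, 3, 12, 320, 576), dtype=np.float32)
blob, spec = pack_blob({"chunk5": frames_c5, "chunk7": frames_c7})
```

The client then uploads `blob` once via `/api/v1/batch/upload-data` and references `spec` offsets from individual ChunkSpecs, avoiding one round-trip per conditioning input.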

Why SSE instead of WebSocket?

Progress reporting is unidirectional (server → client). SSE is plain HTTP — no connection upgrade, works with any HTTP client, trivial to parse. WebSocket would add complexity for no benefit.

Why server-side temp files instead of in-memory?

Video tensors can be gigabytes. Streaming output to disk chunk-by-chunk keeps memory bounded to one chunk at a time rather than accumulating the full video in memory.
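A minimal sketch of the bounded-memory idea (the `write_chunks` helper and file layout are illustrative, not the actual `generate.py` implementation): each chunk is appended to a temp file as soon as it is produced, so peak memory is one chunk regardless of total video length.

```python
import os
import tempfile

import numpy as np

def write_chunks(chunk_iter):
    """Append each generated chunk to a temp file as raw float32 bytes,
    keeping peak memory at one chunk regardless of total video length."""
    fd, path = tempfile.mkstemp(prefix="generate_output_", suffix=".raw")
    total_frames = 0
    with os.fdopen(fd, "wb") as f:
        for chunk in chunk_iter:  # chunk: (frames, H, W, C) float32 array
            f.write(np.ascontiguousarray(chunk, dtype=np.float32).tobytes())
            total_frames += chunk.shape[0]
    return path, total_frames

# 8 chunks of 12 frames each -> 96 frames on disk, one chunk in RAM at a time.
chunks = (np.zeros((12, 32, 32, 3), dtype=np.float32) for _ in range(8))
path, n = write_chunks(chunks)
```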

Why module-level threading primitives?

Pipeline inference is synchronous and GPU-bound. The generation runs in a sync generator consumed by FastAPI's StreamingResponse. Async locks would require restructuring the pipeline call path for no gain. Single-client constraint (one generation at a time, 409 if busy) is enforced via threading.Lock.
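The single-client guard reduces to a non-blocking `threading.Lock` acquire; a minimal stdlib sketch (the function names are illustrative, and the actual wiring in `app.py` translates a failed acquire into an HTTP 409):

```python
import threading

# Module-level lock: at most one generation runs at a time.
_generation_lock = threading.Lock()

def try_start_generation():
    """Return True if this caller won the lock; a concurrent caller gets
    False, which the endpoint would map to HTTP 409."""
    return _generation_lock.acquire(blocking=False)

def finish_generation():
    _generation_lock.release()

assert try_start_generation() is True   # first request proceeds
assert try_start_generation() is False  # concurrent request -> 409
finish_generation()
assert try_start_generation() is True   # free again after completion
finish_generation()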

Changes

| File | What |
|------|------|
| `generate.py` (new) | Core generation engine: input decoding, ChunkSpec → pipeline kwargs, SSE streaming, processor chaining, temp file lifecycle |
| `schema.py` | `GenerateRequest`, `ChunkSpec`, `GenerateResponse`, upload/download response schemas |
| `app.py` | Five new endpoint handlers + concurrent generation guard + blob size limit |
| `pipeline_processor.py` | `batch_mode` for queue-based chunk processing (reuses existing processor chaining) |
| `recording.py` | Temp file prefixes for generate input/output/data |
| `docs/api/generate.md` | Binary protocol documentation |
| `scripts/test_generate_endpoint.py` | Manual test script covering LoRA ramp, v2v, VACE conditioning, inpainting |

Per-chunk control via ChunkSpec

Every generation parameter can be overridden per-chunk. Only fields that change need to be specified; prompts are sticky (the last-set value persists). Example:

```json
{
  "pipeline_id": "longlive",
  "prompt": "a cat walking",
  "num_frames": 96,
  "seed": 42,
  "chunk_specs": [
    {"chunk": 0, "lora_scales": {"/path/to/lora.safetensors": 0.0}},
    {"chunk": 3, "text": "a cat jumping", "lora_scales": {"/path/to/lora.safetensors": 0.5}},
    {"chunk": 5, "vace_frames_offset": 0, "vace_frames_shape": [1, 3, 12, 320, 576]}
  ],
  "data_blob_path": "<from /batch/upload-data>"
}
```

Limitations / follow-ups

- Chunk size is determined server-side; clients cannot query it before generating (separate PR planned to expose this)
- Single-instance only (upload paths are local filesystem references)

Test plan

- `uv run daydream-scope` starts without errors
- Text-to-video generation
- Video-to-video generation
- VACE depth/structure conditioning
- VACE inpainting with masks
- LoRA scale ramping across chunks
- Per-chunk prompt keyframing
- Cancellation mid-generation
- Concurrent generation rejection (409)
- ComfyUI ScopeSampler node integration
- Tested with LongLive pipeline
- Tested with StreamDiffusionV2 pipeline

# Add batch video generation endpoint with SSE streaming

## Summary

Adds `/api/v1/generate` endpoint for batch video generation with server-side chunking and SSE progress streaming. Supports text-to-video, video-to-video, VACE conditioning, and comprehensive per-chunk parameter scheduling.

This is important for the ComfyUI node wrapper for Scope. It could also conceivably replace test.py/test_vace.py, or at least their boilerplate code.

## Changes

- **`schema.py`**: Add `GenerateRequest`/`GenerateResponse` models with `EncodedArray` for binary data
- **`generate.py`**: New module handling chunked generation with SSE progress events
- **`app.py`**: Wire up the endpoint
- **`test_generate_endpoint.py`**: Integration tests for v2v, depth, inpainting, LoRA ramps
- **ComfyUI nodes**: Update `ScopeSampler` to use new schema

## Features

### Generation modes
- **Text-to-video**: Generate from prompt alone
- **Video-to-video**: Transform input video with configurable noise scale

### VACE conditioning
- **Reference images**: Style/identity conditioning via image paths
- **Depth/structure guidance**: Pass conditioning frames for structural control
- **Inpainting**: Binary masks specify regions to regenerate vs preserve

### Per-chunk parameter scheduling

All scheduling parameters accept either a single value (applied to all chunks) or a list (applied per-chunk, last value repeats if list is shorter than chunk count).

| Parameter | Type | Description |
|-----------|------|-------------|
| `seed` | `int \| list[int]` | Random seed per chunk |
| `noise_scale` | `float \| list[float]` | V2V noise injection strength |
| `vace_context_scale` | `float \| list[float]` | VACE conditioning influence |
| `lora_scales` | `dict[str, float \| list[float]]` | Per-LoRA strength scheduling |
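The scalar-or-list rule above can be sketched in a few lines (the `resolve_schedule` helper is illustrative, not the actual server code):

```python
def resolve_schedule(value, num_chunks):
    """Expand a scalar-or-list parameter to one value per chunk.

    A scalar applies to every chunk; a list applies per-chunk, with the
    last value repeating if the list is shorter than the chunk count.
    """
    if not isinstance(value, list):
        return [value] * num_chunks
    return [value[min(i, len(value) - 1)] for i in range(num_chunks)]

resolve_schedule(0.6, 4)         # [0.6, 0.6, 0.6, 0.6]
resolve_schedule([0.0, 0.5], 4)  # [0.0, 0.5, 0.5, 0.5]
```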

### Sparse keyframe updates

These parameters use a chunk-indexed specification, only sending updates when values change (sticky behavior).

| Parameter | Type | Description |
|-----------|------|-------------|
| `chunk_prompts` | `list[{chunk, text}]` | Prompt changes at specific chunks |
| `first_frames` | `list[{chunk, image}]` | First frame anchors for extension mode |
| `last_frames` | `list[{chunk, image}]` | Last frame anchors for extension mode |
| `vace_ref_images` | `list[{chunk, images}]` | Reference images at specific chunks |
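Sticky resolution for these sparse updates can be sketched as follows, shown here for `chunk_prompts` (the `resolve_keyframes` helper is illustrative, not the actual server code):

```python
def resolve_keyframes(updates, num_chunks, default=None):
    """Expand sparse {chunk, text} updates into one value per chunk.

    The last-set value persists until the next update (sticky behavior).
    """
    by_chunk = {u["chunk"]: u["text"] for u in updates}
    out, current = [], default
    for i in range(num_chunks):
        current = by_chunk.get(i, current)
        out.append(current)
    return out

resolve_keyframes(
    [{"chunk": 3, "text": "a cat jumping"}],
    num_chunks=6,
    default="a cat sitting calmly",
)
# ['a cat sitting calmly'] * 3 + ['a cat jumping'] * 3
```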

## Design decisions

Some features were left out of this PR for simplicity (e.g., prompt spatial/temporal blending). They can be added in a follow-up.

### SSE streaming

Clients such as test scripts and ComfyUI nodes need performance metrics and progress updates. SSE provides per-chunk progress updates without requiring WebSocket infrastructure:

```
event: progress
data: {"chunk": 1, "total_chunks": 8, "fps": 4.2, "latency": 2.85}

event: progress
data: {"chunk": 2, "total_chunks": 8, "fps": 4.5, "latency": 2.67}

event: complete
data: {"video_base64": "...", "video_shape": [96, 320, 576, 3], ...}
```
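Parsing this stream on the client side needs nothing beyond line splitting and JSON decoding; a minimal sketch (the `parse_sse` helper is illustrative, and a real client would feed it lines from a streaming HTTP response):

```python
import json

def parse_sse(lines):
    """Parse SSE lines into (event, data) pairs; data payloads are JSON."""
    event = None
    for raw in lines:
        if raw.startswith("event:"):
            event = raw.split(":", 1)[1].strip()
        elif raw.startswith("data:"):
            yield event, json.loads(raw.split(":", 1)[1])

sample = [
    "event: progress",
    'data: {"chunk": 1, "total_chunks": 8}',
    "",
    "event: complete",
    'data: {"video_shape": [96, 320, 576, 3]}',
]
events = list(parse_sse(sample))
# [('progress', {'chunk': 1, 'total_chunks': 8}), ('complete', ...)]
```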

### Server-side chunking

The server determines chunk size from the pipeline, handles frame padding, and manages KV cache initialization. Callers specify total frames and per-chunk parameters; the server handles the rest.
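The chunk-count and padding arithmetic amounts to a ceiling division; a sketch under the assumption that the pipeline reports a fixed chunk size (the `plan_chunks` helper is illustrative, not the actual server code):

```python
import math

def plan_chunks(num_frames, chunk_size):
    """Return (number of chunks, frames of padding on the last chunk)
    for a pipeline-specific chunk_size."""
    num_chunks = math.ceil(num_frames / chunk_size)
    padding = num_chunks * chunk_size - num_frames
    return num_chunks, padding

plan_chunks(96, 12)   # (8, 0) -- 96 frames divide evenly into 8 chunks
plan_chunks(100, 12)  # (9, 8) -- last chunk padded with 8 frames
```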

## Example usage

### LoRA strength ramp (dissolve effect)

```python
request = GenerateRequest(
    pipeline_id="longlive",
    prompt="a woman dissolving into particles",
    num_frames=96,  # 8 chunks × 12 frames
    lora_scales={
        "path/to/dissolve.safetensors": [0.0, 0.15, 0.3, 0.5, 0.7, 0.85, 1.0, 1.0]
    },
)
```

### Video-to-video with prompt changes

```python
request = GenerateRequest(
    pipeline_id="longlive",
    prompt="a cat sitting calmly",
    chunk_prompts=[
        {"chunk": 3, "text": "a cat jumping"},
        {"chunk": 6, "text": "a cat landing gracefully"},
    ],
    input_video=EncodedArray(base64="...", shape=[96, 512, 512, 3]),
    noise_scale=0.6,
)
```

### Depth-guided generation

```python
request = GenerateRequest(
    pipeline_id="longlive",
    prompt="a robot walking through a forest",
    vace_frames=EncodedArray(base64="...", shape=[1, 3, 48, 320, 576]),
    vace_context_scale=1.5,
)
```

## Test plan

- [x] `uv run daydream-scope` starts without errors
- [x] V2V generation produces correct output
- [x] VACE depth conditioning works
- [x] VACE inpainting with masks works
- [x] LoRA scale ramping works across chunks
- [x] Per-chunk noise scale scheduling works
- [x] Prompt keyframing updates at correct chunks
- [x] ComfyUI ScopeSampler node works (WIP)
- [x] Tested with LongLive
- [x] Same test with StreamDiffusionV2

@ryanontheinside ryanontheinside changed the title feat: generate endpoint with SSE streaming feat: batch video generation endpoint (/api/v1/generate) Feb 20, 2026
@ryanontheinside ryanontheinside changed the title feat: batch video generation endpoint (/api/v1/generate) feat: batch video generation endpoint Feb 20, 2026